Code Parser

Parses the code and generates the formatted lines.

Code

CodeParser.h

Type: Header file
Language: C++

CodeParser.cpp

Type: Code file
Language: C++

This class contains the main formatting logic. It loops through the lines of the file and checks for various transitions while generating the formatted text.

The ParseLines(std::string filename, std::string typeExt, int startLine, int endLine) method implements the bulk of the formatting logic. It starts by initializing an instance of the TypeCache and two stacks: one keeps track of languages and the other keeps track of states. The type specified by typeExt is loaded first. If this type has a language transition that is marked as base, the type named by that transition is loaded instead, although any language transitions defined by the original type remain valid. A base language and a base state object are stored in currentLang and currentState respectively, and both are pushed onto their respective stacks. Next the parser attempts to open the file specified by filename. If opening the file fails, an error is printed and an empty list of lines is returned; otherwise a new list of lines is created.
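The setup described above could look roughly like the following. This is only a minimal sketch: Lang, State, ParserContext and ReadLines are hypothetical stand-ins rather than the project's real types, and a base state id of 0 is an assumption.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the project's language and state objects.
struct Lang  { std::string typeExt; };
struct State { int id; };  // assumption: id 0 marks a base state

// Holds the two stacks, seeded with a base language and a base state.
struct ParserContext {
    std::vector<Lang>  langStack;
    std::vector<State> stateStack;

    explicit ParserContext(const std::string& typeExt) {
        langStack.push_back(Lang{typeExt});
        stateStack.push_back(State{0});
    }
};

// Reads all lines of a file. On failure an error is printed and an
// empty list is returned.
std::vector<std::string> ReadLines(const std::string& filename) {
    std::vector<std::string> lines;
    std::ifstream file(filename);
    if (!file.is_open()) {
        std::cerr << "Unable to open file: " << filename << '\n';
        return lines;
    }
    std::string line;
    while (std::getline(file, line)) {
        lines.push_back(line);
    }
    return lines;
}
```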

If the file is successfully loaded then the parser starts looping through the lines of the file. If the current state isn't the base state then it gets re-opened before the line processing starts. If the current line number is larger than endLine, the looping ends as no other lines need to be printed.

The parser loops through each character in the line. If a character is unprintable it is ignored. The program also keeps track of the previous character to ensure that formatting isn't applied to part of a larger element.

The parser starts by checking to see if the current character is the start of a start sequence for a language transition. If a match is found then that language transition gets added to the language stack and the type specified by the transition is loaded. The previous current state is closed before a new base state is created and added to the stack. The PrintStart() method of the language transition is called to print the start sequence. This method returns the number of characters printed. If non-zero the length gets added to the current line position and then the loop resets, otherwise processing continues with the current character but with the new type loaded. This is to handle language transitions which have complicated start sequences that have multiple parts. The start sequence triggers the language transition but is formatted by state transitions defined in the new language.
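The start-sequence check can be sketched as a prefix comparison at the current position. The LanguageTransition struct, its field names, and the -1 "no match" return value are assumptions made for illustration. A return value of 0 models a transition whose start sequence is left for the new language's state transitions to format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical language transition; field names are illustrative only.
struct LanguageTransition {
    std::string start;    // sequence that enters the language
    std::string end;      // sequence that leaves it
    std::string typeExt;  // type to load on entry
    bool printsStart;     // false when the new language formats the start itself
};

// True if seq occurs in line starting exactly at pos.
bool MatchesAt(const std::string& line, std::size_t pos, const std::string& seq) {
    return !seq.empty() && pos <= line.size() &&
           line.compare(pos, seq.size(), seq) == 0;
}

// Tries each transition at pos. On a match the transition is pushed onto the
// language stack and the number of characters consumed by the start sequence
// is returned (possibly 0). Returns -1 when nothing matched.
int TryEnterLanguage(const std::string& line, std::size_t pos,
                     const std::vector<LanguageTransition>& transitions,
                     std::vector<LanguageTransition>& langStack) {
    for (const LanguageTransition& t : transitions) {
        if (MatchesAt(line, pos, t.start)) {
            langStack.push_back(t);
            return t.printsStart ? static_cast<int>(t.start.size()) : 0;
        }
    }
    return -1;
}
```

A caller would add a positive result to the line position and restart the loop, and on 0 would keep processing the same character with the new type loaded.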

If line processing isn't reset the parser then checks if the current character is the start of the end sequence for the current language transition. If the end sequence matches then the current language transition gets removed from the stack and any open states in that type get removed. The previous type is then reloaded. The PrintEnd() method of the language transition is called to print the end sequence. This method returns the number of characters printed. If non-zero the length gets added to the current line position and then the loop resets, otherwise processing continues with the current character but with the previous language loaded.

If line processing isn't reset and nested states are allowed by the current state, the parser then checks if the current character is the start of the start sequence for a state transition. If a match is found then the previous state is closed and the new state transition is added to the stack. The PrintStart() method of the state transition is called to print the start sequence. This method returns the number of characters printed. If the state is set to end after a word then it is immediately removed from the stack and the previous state restarted. The number of characters printed is added to the current line position and the loop resets.

If line processing isn't reset the parser then checks if the current character is the start of a word from a keyword set. This is done by calling the FindAndPrintSetWord() method on the currently loaded type. This method returns the number of characters printed. If a word was found then this result will be non-zero. If this is the case the number of characters is added to the current line position and the loop resets, otherwise processing continues with the current character.
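A keyword lookup of this kind might be sketched as follows. Only the method name FindAndPrintSetWord comes from the text; the free-function signature, the use of std::set, the IsWordChar helper and the span markup are all assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: true for characters that can be part of a word.
bool IsWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// If a word from `words` starts at pos, appends it to out wrapped in a span
// and returns its length. Otherwise returns 0 and leaves out untouched.
std::size_t FindAndPrintSetWord(const std::string& line, std::size_t pos,
                                const std::set<std::string>& words,
                                const std::string& cssClass, std::string& out) {
    assert(pos <= line.size());
    // Do not format the tail of a larger word, e.g. "int" inside "print".
    if (pos > 0 && IsWordChar(line[pos - 1])) return 0;

    std::size_t end = pos;
    while (end < line.size() && IsWordChar(line[end])) ++end;

    const std::string word = line.substr(pos, end - pos);
    if (word.empty() || words.count(word) == 0) return 0;

    out += "<span class=\"" + cssClass + "\">" + word + "</span>";
    return word.size();
}
```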

If line processing isn't reset and the current state is not the base state, the parser then checks if the current character is the start of the end sequence for the current state. If the end sequence matches then the current state is removed from the stack. The PrintEnd() method of the state transition is called to print the end sequence. This method returns the number of characters printed. The previous state is restarted, if required. The number of characters printed is added to the current line position and the loop resets.
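The end-of-state check could be sketched as below. StateTransition and TryEndState are hypothetical names, a base state id of 0 is assumed, and HTML escaping of the end sequence is left out to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical state transition; id 0 is assumed to mark the base state.
struct StateTransition {
    int id;
    std::string start;
    std::string end;
    std::string cssClass;
};

// If the end sequence of the top state begins at pos, prints it, closes the
// span, pops the state and returns the number of characters printed.
// Returns 0 when the top state is the base state or nothing matched.
std::size_t TryEndState(const std::string& line, std::size_t pos,
                        std::vector<StateTransition>& stack, std::string& out) {
    if (stack.empty() || stack.back().id == 0) return 0;

    // Copy the sequence: the stack entry is destroyed by pop_back() below.
    const std::string end = stack.back().end;
    if (end.empty() || pos > line.size() ||
        line.compare(pos, end.size(), end) != 0) {
        return 0;
    }

    out += end;
    out += "</span>";
    stack.pop_back();
    return end.size();
}
```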

If line processing isn't reset, the current state is the base state, and the previous character wasn't alphanumeric, the parser checks if the current character is the start of a number. This is done by calling the FindAndPrintNumber() method of the currently loaded type. This method returns the number of characters printed. If a number was found then this result will be non-zero. If this is the case the number of characters is added to the current line position and the loop resets, otherwise processing continues with the current character.
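A number check along these lines might look like the sketch below. Only the name FindAndPrintNumber comes from the text; the signature, the span markup, and the accepted syntax (digits with an optional fractional part) are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper wrapping std::isdigit safely for plain char.
bool IsDigit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// If a number starts at pos, appends it to out wrapped in a span and returns
// its length. Otherwise returns 0 and leaves out untouched.
std::size_t FindAndPrintNumber(const std::string& line, std::size_t pos,
                               std::string& out) {
    assert(pos <= line.size());
    // A digit that follows a letter or digit belongs to a larger element.
    if (pos > 0 && std::isalnum(static_cast<unsigned char>(line[pos - 1]))) {
        return 0;
    }

    std::size_t end = pos;
    while (end < line.size() && IsDigit(line[end])) ++end;
    if (end == pos) return 0;  // no digits at pos

    // Optional fractional part: a '.' followed by at least one digit.
    if (end + 1 < line.size() && line[end] == '.' && IsDigit(line[end + 1])) {
        ++end;
        while (end < line.size() && IsDigit(line[end])) ++end;
    }

    out += "<span class=\"number\">" + line.substr(pos, end - pos) + "</span>";
    return end - pos;
}
```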

Finally, if the character is not part of any other sequence, it is simply printed to the output. Newline characters are ignored and tabs are changed to spaces. The character is escaped before printing to ensure the output is valid HTML.
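The fallback step can be illustrated with a small escaping routine. PrintChar is a hypothetical name, and the tab width of four spaces is an assumption since the text does not say how many spaces replace a tab.

```cpp
#include <cstddef>
#include <string>

// Appends one character to out as HTML-safe text.
// tabWidth is an assumption; the original width is not documented here.
void PrintChar(char c, std::string& out, int tabWidth = 4) {
    switch (c) {
        case '\n':
        case '\r':
            break;  // line breaks are dropped
        case '\t':
            out.append(static_cast<std::size_t>(tabWidth), ' ');
            break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '&':  out += "&amp;";  break;
        case '"':  out += "&quot;"; break;
        default:   out += c;        break;
    }
}
```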

After a line has finished being parsed the program closes the current state if it's not the base state. It then loops through all current states and removes any from the stack that are marked to end after a line. If the current line number is greater than startLine, the formatted line is added to the lines vector. Otherwise the line is ignored.
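Removing the states that end with the line could be done with the erase-remove idiom, as in this sketch. The State struct and its endsAfterLine flag are hypothetical names for whatever the real state object exposes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical state record; endsAfterLine marks states such as
// single-line comments that must not survive past the end of the line.
struct State {
    int id;
    bool endsAfterLine;
};

// Removes every state marked to end after a line, keeping the order of
// the remaining states.
void RemoveLineStates(std::vector<State>& states) {
    states.erase(std::remove_if(states.begin(), states.end(),
                                [](const State& s) { return s.endsAfterLine; }),
                 states.end());
}
```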

After all the required lines have been processed the method returns the lines vector, which contains all the formatted lines.
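The effect of startLine and endLine can be sketched as follows. Lines before startLine are still visited, so that state carried across lines stays correct, but they are not added to the result. The SelectLines name, the 1-based numbering and the inclusive range are assumptions, and the formatting step is reduced to a placeholder copy.

```cpp
#include <string>
#include <vector>

// Returns the lines in [startLine, endLine], counting from 1 (an assumption).
std::vector<std::string> SelectLines(const std::vector<std::string>& input,
                                     int startLine, int endLine) {
    std::vector<std::string> lines;
    int lineNumber = 0;
    for (const std::string& line : input) {
        ++lineNumber;
        if (lineNumber > endLine) break;  // no further lines need printing

        // Placeholder for the real per-character formatting of the line.
        const std::string formatted = line;

        if (lineNumber >= startLine) lines.push_back(formatted);
    }
    return lines;
}
```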

The PopTypeStates(std::vector<State>& currentStates) method loops through a stack containing states and removes them until it finds a state with an id of 0.
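A minimal sketch of this method is shown below, with a hypothetical State struct. The description does not say whether the state with id 0 is itself removed, so this version assumes it is left on the stack.

```cpp
#include <vector>

// Hypothetical state record; id 0 is taken to mark a type's base state.
struct State {
    int id;
};

// Pops states off the top of the stack until the top state has id 0.
// Assumption: the id 0 state itself stays on the stack.
void PopTypeStates(std::vector<State>& currentStates) {
    while (!currentStates.empty() && currentStates.back().id != 0) {
        currentStates.pop_back();
    }
}
```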